Words, Numbers and All That: the Lexicon in Sentence Understanding

Author

  • Suzanne Stevenson
Abstract

1 Cross-disciplinary Issues in Lexical Theories

It is hardly a controversial statement that the acquisition and processing of language require knowledge of its words. Yet, the type and use of information encoded in a lexical entry, the relation of words to each other in the lexicon, and the relationship of the lexicon to the grammar, are complex and unsettled issues, on which researchers hold very different views. While there is a recent consensus that mechanisms operating in the lexicon are not substantially different from those operating in the syntax, there are differences on whether the syntax is in the lexicon, or the lexicon is in the syntax. On the one view, the lexicon is a static repository of very rich representations, which regulate the composition of words to an extent that goes beyond the phrase, while on the other view the lexicon is dynamically generated as a result of composition and competition mechanisms, largely syntactic in nature. Roughly speaking, computational linguistics and psycholinguistics in general follow the first view, and the two fields are converging on some similar lexicalised, probabilistic models of grammars. Theoretical linguistics has recently proposed models of the latter type.

In computational linguistics, work on the lexicon has stemmed from two different areas of research: parsing and grammar formalisms, and construction of electronic databases (lexicography). In the area of parsing, the interest in probabilistic models and lexicalised grammars did not develop simultaneously. Parsers based on probabilistic context-free grammars were motivated by the difficulties in building robust, large-scale systems using the explicit representation of linguistic knowledge. Large corpus annotation efforts and the creation of tree-banks (text corpora annotated with syntactic structures) enabled researchers to develop and automatically train probabilistic models of syntactic disambiguation (Marcus, Santorini, and Marcinkiewicz 1993).
In an attempt to take advantage of the insights gained in the area of statistical speech processing, computational linguists initially adopted very simplified statistical models of grammar and parsing, abandoning the more sophisticated lexicalised feature-based formalisms. However, it soon became apparent that the success of probabilistic context-free grammars was limited by the strong (and incorrect) assumption of probabilistic independence of

Publication date: 2002